OUTLINE
I. Data Sources
II. Representing Surfaces
III. Methods of Interpolation (point)
IV. Methods of Interpolation (areal)
Definition
· Digital Elevation Model: A model of the continuous variation of relief over a geographic area.
While these surfaces show variation along three data axes (x, y, z), they are not considered true 3-D representations, a term reserved for data that vary continuously throughout a 3-D framework (Burrough and McDonnell, 1998).
I) Data Sources
· Stereo aerial photos or satellite images
· Point samples measured directly with GPS under a specific sampling design.
· Digitized topographic maps
II) Representing Surfaces
A) Isolines (contours):
· elevation can be represented as contours connecting points of equal value. This representation is well suited for display purposes but ill suited for numerical analysis or modeling.
B) Mathematical models:
· high degree of complexity; not well suited for general application (Fourier analysis, high-order polynomials).
C) Point models
· most widely used models, suitable for both display and numerical analysis/modeling. Uniform sampling, or regular lattice sampling, is based on a regular grid. Adaptive sampling, or irregular lattice sampling, results when points are collected based on the variability of the surface: the more variability, the more points collected.
D) Altitude matrix
· surface represented by elevation values at uniform intervals.
· most common representation for a DEM.
· well suited for contour, slope, aspect, shading, basin delineation.
· problems with data redundancy, fixed cell size, cross axis calculations.
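As a quick illustration (not part of the original notes), here is a minimal sketch of deriving slope and aspect from an altitude matrix with NumPy; the elevation values, the 30 m cell size, and the simple finite-difference approach are all assumptions made for the example.

```python
import numpy as np

# Hypothetical 5 x 5 altitude matrix (elevations in meters) on a 30 m grid.
dem = np.array([
    [100, 101, 103, 106, 110],
    [ 99, 100, 102, 105, 109],
    [ 97,  98, 100, 103, 107],
    [ 94,  95,  97, 100, 104],
    [ 90,  91,  93,  96, 100],
], dtype=float)
cell_size = 30.0  # meters

# Finite-difference gradients along the rows (y) and columns (x) of the matrix.
dz_dy, dz_dx = np.gradient(dem, cell_size)

# Slope: angle of steepest ascent/descent at each cell.
slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
# Aspect here is the direction of steepest ascent, measured counterclockwise
# from the +x (east) axis; GIS packages use differing aspect conventions.
aspect_deg = np.degrees(np.arctan2(dz_dy, dz_dx)) % 360.0

print(slope_deg.round(1))
print(aspect_deg.round(1))
```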
E) Triangulated Irregular Network (TIN)
· uses sheets of continuous, connected triangular facets based on a Delaunay triangulation of irregularly spaced nodes or observations (often generated by progressive sampling).
· vector topological structure, similar to that for defining polygons.
· altitude and X,Y coordinates are stored at the nodes.
· each triangle serves as a reference facet for other calculations such as slope and aspect.
· avoids redundancy of altitude matrix.
· more efficient calculation of slope.
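A minimal sketch of building a TIN from irregularly spaced points, assuming SciPy's Delaunay triangulation and made-up survey coordinates; each facet's slope is taken from the plane through its three nodes.

```python
import numpy as np
from scipy.spatial import Delaunay

# Hypothetical irregularly spaced survey points: x, y, z (elevation).
pts = np.array([
    [ 0.0,  0.0, 10.0],
    [50.0,  5.0, 12.0],
    [20.0, 40.0, 15.0],
    [60.0, 45.0, 11.0],
    [35.0, 80.0, 18.0],
])

# Delaunay triangulation of the XY locations defines the TIN facets.
tin = Delaunay(pts[:, :2])

for tri in tin.simplices:
    p1, p2, p3 = pts[tri]
    # Normal of the plane through the three nodes gives the facet slope.
    nx, ny, nz = np.cross(p2 - p1, p3 - p1)
    slope = np.degrees(np.arctan(np.hypot(nx, ny) / abs(nz)))
    print(f"facet {tri}: slope = {slope:.1f} degrees")
```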
III) Methods of Interpolation (point)
Interpolation:
· the procedure of estimating the value of properties at unsampled sites within an area covered by existing point observations.
Extrapolation:
· estimating the value of a property at sites outside the area covered by existing observations.
Principle
· points close together in space are more likely to have similar values of a property of interest than points further apart.
Assumption
· The assumptions made in using one interpolation procedure versus another, as they relate to the data being interpolated, must be clearly stated. Does the interpolation procedure make sense? What factors must be considered when performing interpolations?
The Challenge of Interpolation
Finding a plausible model to suit the phenomena being modeled.
A) Discrete Techniques
1) Thiessen Polygons - use the closest sample to determine the value at a given point, with polygon boundaries formed by the perpendicular bisectors between sample points. Problems: the size and shape of the polygons depend on the sampling design, the value of interest is estimated from a sample of one, and distance from the sample point is not factored into the interpolation.
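A minimal sketch of Thiessen-polygon estimation, assuming hypothetical sample points and values; assigning the value of the nearest sample is equivalent to reading off the Thiessen (Voronoi) polygon that the location falls in.

```python
import numpy as np

# Hypothetical sample points (x, y) and their measured values.
samples = np.array([[2.0, 3.0], [8.0, 1.0], [5.0, 7.0], [9.0, 8.0]])
values = np.array([10.0, 14.0, 9.0, 12.0])

def thiessen_estimate(x, y):
    """Assign the value of the nearest sample, i.e. the Thiessen polygon
    containing the query location."""
    d = np.hypot(samples[:, 0] - x, samples[:, 1] - y)
    return values[np.argmin(d)]

print(thiessen_estimate(6.0, 6.0))  # takes the value of the closest sample
```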
B) Continuous Methods
1) Global Methods
Trend surface analysis:
· A polynomial fit of the points to approximate the surface. The method is often used to remove broad features prior to applying some other local interpolator.
· Example (trend surface polynomials by number of independent variables and degree):

1 independent variable
  1st degree: y = b0 + b1X1
  2nd degree: y = b0 + b1X1 + b2X1^2
  3rd degree: y = b0 + b1X1 + b2X1^2 + b3X1^3

2 independent variables
  1st degree: y = b0 + b1X1 + b2X2
  2nd degree: y = b0 + b1X1 + b2X2 + b3X1^2 + b4X2^2 + b5X1X2
  3rd degree: y = b0 + b1X1 + b2X2 + b3X1^2 + b4X2^2 + b5X1X2 + b6X1^3 + b7X2^3 + b8X1X2^2 + b9X2X1^2

3 independent variables
  1st degree: y = b0 + b1X1 + b2X2 + b3X3
  2nd degree: y = b0 + b1X1 + b2X2 + b3X3 + b4X1^2 + b5X2^2 + b6X3^2 + b7X1X2 + b8X1X3 + b9X2X3
  3rd degree: y = b0 + b1X1 + b2X2 + b3X3 + b4X1^2 + b5X2^2 + b6X3^2 + b7X1X2 + b8X1X3 + b9X2X3 + b10X1^3 + b11X2^3 + b12X3^3 + b13X1^2X2 + b14X1X2^2 + ... (remaining cubic cross terms)
· Problems: very susceptible to outliers; hard to ascribe physical meaning to higher-order polynomials (a least-squares fitting sketch is given below).
· Fourier series: linear combination of sine and cosine waves. Better for analysis of periodic functions.
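For the trend-surface idea above, a minimal sketch of fitting a 2nd-degree surface in two independent variables by ordinary least squares; the synthetic sample data and the NumPy-based fit are assumptions for illustration, not a prescribed procedure from the notes.

```python
import numpy as np

# Hypothetical sample locations (x, y) and observed values z.
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 100, 30), rng.uniform(0, 100, 30)
z = 50 + 0.3 * x - 0.2 * y + 0.002 * x * y + rng.normal(0, 1, 30)

# Design matrix for a 2nd-degree trend surface in two independent variables:
# z = b0 + b1*x + b2*y + b3*x^2 + b4*y^2 + b5*x*y
A = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)

def trend(xq, yq):
    """Evaluate the fitted trend surface at a query location."""
    return coeffs @ np.array([1.0, xq, yq, xq**2, yq**2, xq * yq])

print(coeffs.round(4))
print(round(trend(50.0, 50.0), 2))
```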
2) Local Interpolators
· Splines: piece-wise polynomial fit which ensure that joins between pieces are smooth. May vary as a function of the location of the break point
· Moving averages: a smoothing technique that computes the average value from a local neighborhood. Can be weighted as a function of distance to give a weighted moving average. Factors to consider:
· size of neighborhood.
· shape of neighborhood.
· minimum number of points.
· location of points.
· shape of weight function.
· z(xj) = Sum(i = 1..n) [ z(xi) * wtij ] / Sum(i = 1..n) wtij   (equation 6.1)
where
z(xj) = interpolated value at the unknown location xj
z(xi) = value at data point xi
wtij = weight given to data point xi, a function of the distance between the unknown point xj and data point xi
n = the number of data points in the neighborhood
wtij can equal 1 (a simple average) or 1/d^p, where d is the distance between xi and xj and p is a chosen power (giving a distance-weighted moving average)
· Problems: maxima and minima can occur only at data points; there is no built-in method for assessing the quality of the interpolation; and the "duck-egg" problem, a halo pattern that forms around solitary points whose values differ greatly from their surroundings.
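A minimal sketch of the weighted moving average of equation 6.1 with wtij = 1/d^p (inverse-distance weighting); the sample points, power, search radius, and minimum point count are assumed for the example.

```python
import numpy as np

# Hypothetical sample points (x, y) with measured values z.
xy = np.array([[1.0, 1.0], [4.0, 2.0], [3.0, 5.0], [6.0, 6.0]])
z = np.array([12.0, 15.0, 10.0, 14.0])

def idw(x0, y0, power=2.0, radius=5.0, min_points=2):
    """Weighted moving average of equation 6.1 with weights wtij = 1/d^power,
    restricted to a circular neighborhood of the given radius."""
    d = np.hypot(xy[:, 0] - x0, xy[:, 1] - y0)
    if np.any(d == 0):                      # query coincides with a sample point
        return z[np.argmin(d)]
    mask = d <= radius
    if mask.sum() < min_points:             # not enough neighbors to interpolate
        return np.nan
    w = 1.0 / d[mask] ** power
    return np.sum(w * z[mask]) / np.sum(w)

print(round(idw(3.0, 3.0), 2))
```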
3) Kriging (Global & Local):
An optimal interpolation method that uses spatial autocovariance, based on the theory of regionalized variables. Spatial variation is expressed as the sum of a structural component associated with a constant mean value or trend, a random but spatially correlated component, and spatially independent random noise (residual error). The semivariogram is used to determine the distance over which values are autocorrelated, the direction of that autocorrelation, and the appropriate weighting scheme; it plots semi-variance against lag (distance).
Z(x) = m(x) + e'(x) + e''   (equation 6.2)
where
m(x) is a deterministic function describing the structural component (the trend)
e'(x) is the stochastic, locally varying but spatially dependent component, called the regionalized variable
e'' is a residual, spatially independent noise term
Characteristics
· exact or perfect interpolator: surface passes through all points whose values are known.
· provides a measure of statistical error.
· minimizes the variance.
· unbiased estimator.
The first step is to decide on an appropriate function for m(x). If no trend is assumed, the mean value of the sampling area is used.
Calculate e'(x) using the semi-variance g as an estimate:
g(h) = 1/(2n) * Sum(i = 1..n) {z(xi) - z(xi - h)}^2
where
n = the number of pairs of sample points separated by the distance (lag) h
{z(xi) - z(xi - h)}^2 = the squared difference in value between sample points a distance h apart
A variogram, which plots g(h) as a function of the lag h, can then be created.
The variogram provides information about the size of the search window, the shape of the spatial correlation function, and its relationship to the overall variance.
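A minimal sketch of computing an experimental semivariogram g(h) by binning point pairs into lag classes; the synthetic data, lag width, and number of lags are assumptions for illustration.

```python
import numpy as np

# Hypothetical sample locations and values.
rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(80, 2))
z = np.sin(xy[:, 0] / 20.0) + 0.1 * rng.normal(size=80)

def semivariogram(xy, z, lag_width=10.0, n_lags=8):
    """Experimental semivariance g(h) = sum of squared differences over
    pairs in each lag bin, divided by twice the number of pairs."""
    # All pairwise separation distances and squared value differences.
    dists = np.hypot(xy[:, None, 0] - xy[None, :, 0],
                     xy[:, None, 1] - xy[None, :, 1])
    sqdiff = (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)          # each pair counted once
    d, s = dists[iu], sqdiff[iu]
    lags, gammas = [], []
    for k in range(n_lags):
        in_bin = (d >= k * lag_width) & (d < (k + 1) * lag_width)
        if in_bin.sum() > 0:
            lags.append((k + 0.5) * lag_width)
            gammas.append(s[in_bin].sum() / (2.0 * in_bin.sum()))
    return np.array(lags), np.array(gammas)

lags, gammas = semivariogram(xy, z)
for h, g in zip(lags, gammas):
    print(f"lag {h:5.1f}: gamma = {g:.3f}")
```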
4) Knox Methodology (space-time interaction model)
The Knox analysis consists of pairing the data points and then evaluating whether pairs are found close together in both space and time. A statistical test determines whether the observed number of close-space, close-time pairs deviates significantly from the number expected under a random process. Knox (1964) suggests the construction of a 2 x 2 contingency table as follows:
Each pair of data points is classified as follows:

                            Space
                     Close                    Not close
  Time    Close      Close in both (o11 = T)  Time only (o12)
          Not close  Space only (o21)         Not close (o22)
The test statistic T counts the pairs close in both space and time:
T = Sum over all distinct pairs (i, j) of sij * tij
where,
n is the number of data points;
sij is 1 if the (i, j)-th pair is close in space and zero otherwise;
tij is 1 if the (i, j)-th pair is close in time and zero otherwise.
There are N = n(n-1)/2 distinct pairs that can be formed from n data points.
The Knox statistic (T) is tested against the expected number of pairs that would be found close in both space and time, given that s pairs were found close in space and t pairs were found close in time. The expected number of pairs is calculated under the assumption that space is independent of time. This equation reads:
E(T) = (s * t) / N
The approximate variance of the Knox statistic, developed by Barton and David (1966), adjusts for pairs sharing data points and is expressed in terms of the following quantities (a computational sketch of the test follows this list):
s: the number of pairs found close in space;
t: the number of pairs found close in time;
s1: the number of pairs close in space sharing one data point;
t1: the number of pairs close in time sharing one data point;
N: the total number of pairs;
n: the number of data points (birds, in the West Nile virus example below).
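A minimal computational sketch of the Knox statistic. Instead of the Barton and David variance approximation above, it assesses significance with a Monte Carlo permutation of the event times; that choice, along with the synthetic locations, times, and critical parameters, is an assumption of this example rather than the method described in the notes.

```python
import numpy as np

def knox_test(xy, times, space_crit, time_crit, n_perm=999, seed=0):
    """Knox statistic T = number of pairs close in both space and time,
    with a Monte Carlo permutation p-value (times shuffled over locations)."""
    n = len(times)
    iu = np.triu_indices(n, k=1)                       # the N = n(n-1)/2 pairs
    d_space = np.hypot(xy[:, None, 0] - xy[None, :, 0],
                       xy[:, None, 1] - xy[None, :, 1])[iu]
    s = d_space <= space_crit                          # sij: close in space

    def t_close(t):
        """tij: close in time for a given assignment of times to locations."""
        return np.abs(t[:, None] - t[None, :])[iu] <= time_crit

    T_obs = int(np.sum(s & t_close(times)))
    rng = np.random.default_rng(seed)
    T_perm = np.array([np.sum(s & t_close(rng.permutation(times)))
                       for _ in range(n_perm)])
    p_value = (1 + np.sum(T_perm >= T_obs)) / (n_perm + 1)
    return T_obs, p_value

# Hypothetical event locations (e.g., dead-bird sightings) and report days.
rng = np.random.default_rng(2)
xy = rng.uniform(0.0, 10.0, size=(60, 2))
times = rng.uniform(0.0, 90.0, size=60)
print(knox_test(xy, times, space_crit=1.0, time_crit=7.0))
```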
Critical Parameters Selection
The selection of critical parameters, that is, the threshold values for what is considered 'close' in space and in time, is a challenging task in the Knox methodology.
Running the Knox Test over a Continuous Surface
A one-half mile grid is overlaid across NYC (the global area) and the Knox test is run on the centroid of each grid cell. The use of a buffer around each centroid ensures an overlapping coverage of NYC. The critical parameters are set and the local significance for each cell centroid is assessed.
West Nile Virus Example
IV) Methods of Interpolation (areal)
1) Overlay
· overlay of target and source zones
· determine the proportion of each source zone assigned to each target zone
· apportion the total value of the attribute for each source zone to target zones according to areal proportions.
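A minimal sketch of areal-weighting interpolation following the three steps above, assuming Shapely for the polygon overlay and hypothetical source and target zones.

```python
from shapely.geometry import Polygon

# Hypothetical source zones, each with a total attribute value (e.g., population),
# and one target zone that overlaps both.
source_zones = [
    (Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]), 1000.0),
    (Polygon([(4, 0), (8, 0), (8, 4), (4, 4)]), 600.0),
]
target_zone = Polygon([(2, 1), (6, 1), (6, 3), (2, 3)])

# Apportion each source zone's total to the target in proportion to the
# share of the source zone's area that falls inside the target zone.
estimate = 0.0
for zone, total in source_zones:
    overlap = zone.intersection(target_zone).area
    estimate += total * (overlap / zone.area)

print(round(estimate, 1))  # areal-weighted estimate for the target zone
```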